06. Cointegration

Cointegration

One way to think about whether two stocks’ time series are cointegrated is to check whether some linear combination of them forms a stationary series. For example, suppose stock_1 and stock_2 are each non-stationary, but w_1 \times stock_1 + w_2 \times stock_2 is a stationary series. Then we say that stock_1 and stock_2 are cointegrated.

Hedge Ratio

We can perform a regression where stock_2 is the dependent variable and stock_1 is the independent variable (either stock can play the role of x or y, though the estimated coefficient will differ). The regression coefficient, which is our hedge ratio, is effectively \frac{w_1}{w_2} up to sign: dividing w_1 \times stock_1 + w_2 \times stock_2 by w_2 leaves stock_2 + \frac{w_1}{w_2} \times stock_1 , which is stationary whenever the original combination is. So multiplying stock_1 by \frac{w_1}{w_2} has the same effect as multiplying stock_1 by w_1 and stock_2 by w_2 ; in either case, we’re weighting each stock so that their linear combination produces a stationary series.
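To make the regression step concrete, here is a minimal sketch in plain Python. The data is simulated, not real prices: stock_1 is a random walk and stock_2 is built as twice stock_1 plus noise, so the true hedge ratio of 2 is an assumption of the example. The helper name ols_slope_intercept is also hypothetical.

```python
import random

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# stock_1: a simulated random walk (non-stationary), arbitrary starting level.
stock_1 = [100.0]
for _ in range(999):
    stock_1.append(stock_1[-1] + rng.gauss(0, 1))

# stock_2: twice stock_1 plus stationary noise, so the pair is cointegrated
# by construction with a true hedge ratio of 2 (an assumption of this example).
stock_2 = [2.0 * p + rng.gauss(0, 1) for p in stock_1]

# Regress stock_2 (dependent) on stock_1 (independent); the slope is the hedge ratio.
hedge_ratio, intercept = ols_slope_intercept(stock_1, stock_2)

# The spread is the linear combination we hope is stationary.
spread = [s2 - hedge_ratio * s1 for s1, s2 in zip(stock_1, stock_2)]

print(f"estimated hedge ratio: {hedge_ratio:.3f}")
```

The spread, not either stock on its own, is the series we would then check for stationarity.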

Augmented Dickey Fuller Test

To check if two series are cointegrated, we can use the Augmented Dickey Fuller (ADF) Test. First, let’s get some intuition for what the ADF test is doing: it tries to determine whether the linear combination of the two series (which is itself a time series) is stationary.

A series is stationary when its mean and variance are constant over time, and when the covariance (and hence the autocorrelation) between two observations depends only on the time elapsed between them, not on the specific points in time at which they were observed.

If we can represent a series as an AR(1) model y_t = \beta y_{t-1} + \epsilon_t , let’s think about what happens if \beta is greater than one. We can imagine putting in a value for y_{t-1} to get an estimate for y_t ; then for the next day, we use that value as y_{t-1} in the model to estimate the new y_t . Each step multiplies the previous value by more than one, so we end up with a series that keeps moving away in one direction. Its mean is not constant, and therefore it is not stationary.

Next, if \beta is equal to one, then y_t = y_{t-1} + \epsilon_t . We call this special case a random walk: the current price is equal to the previous price plus some white noise. Even though the mean of this series is constant, its variance grows over time, and its covariance between one time period and another depends upon the point in time of the observations, so it is also not stationary.
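To see why the random walk fails the stationarity conditions, we can unroll the recursion. The sketch below assumes the starting value y_0 is fixed and that the \epsilon_i are independent with mean zero and variance \sigma^2 (the symbol \sigma^2 is introduced here for illustration).

```latex
\begin{align*}
y_t &= y_0 + \sum_{i=1}^{t} \epsilon_i \\
\mathrm{E}[y_t] &= y_0 \\
\mathrm{Var}(y_t) &= t\,\sigma^2 \\
\mathrm{Cov}(y_t, y_{t+k}) &= t\,\sigma^2 \qquad (k \ge 0)
\end{align*}
```

The mean does not depend on t, but the variance and the covariance both do, which is what rules out stationarity.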

Finally, if \beta is less than one (in absolute value), then y_t carries over only a fraction of its previous value y_{t-1} , plus some added random noise \epsilon_t . The series doesn’t trend in a particular direction. Its variance is also constant, and the covariance between any two data points depends only on how far apart they are, not on where they sit in time. You can think of the series like a bouncing rubber ball that’s being tapped lightly by random raindrops. Without the rain, the bouncing ball would have smaller and smaller bounces, and eventually stop bouncing. With random raindrops falling on the ball, some raindrops make the ball bounce more and others make it bounce less. So overall, the ball maintains a roughly constant bounce height over time.
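The three cases can be compared with a small simulation. This is a sketch under stated assumptions: the noise is standard normal, every path starts at zero, and the function names simulate_ar1 and variance_across_paths are hypothetical. It measures the variance of y_t across many simulated paths at an early and a late time. For \beta below one in absolute value and unit-variance noise, the long-run variance should settle near 1 / (1 - \beta^2) .

```python
import random

def simulate_ar1(beta, n_steps, rng, y0=0.0):
    """Simulate y_t = beta * y_{t-1} + eps_t with standard normal eps_t."""
    path = [y0]
    for _ in range(n_steps):
        path.append(beta * path[-1] + rng.gauss(0, 1))
    return path

def variance_across_paths(beta, t, n_paths=1000, seed=0):
    """Sample variance of y_t across many independent simulated paths."""
    rng = random.Random(seed)
    values = [simulate_ar1(beta, t, rng)[-1] for _ in range(n_paths)]
    mean = sum(values) / n_paths
    return sum((v - mean) ** 2 for v in values) / (n_paths - 1)

# beta > 1 (explosive), beta = 1 (random walk), beta < 1 (stationary)
for beta in (1.05, 1.0, 0.5):
    early = variance_across_paths(beta, t=50)
    late = variance_across_paths(beta, t=200)
    print(f"beta={beta}: variance at t=50 is {early:.2f}, at t=200 is {late:.2f}")
```

Only the \beta = 0.5 case should show a variance that stays about the same between the two time points; in the other two cases it keeps growing.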

So conceptually, the Augmented Dickey Fuller Test is a hypothesis test for which the null hypothesis is that a series is a random walk (its \beta is equal to one), and so the null hypothesis assumes that the series is not stationary. The alternative hypothesis is that \beta is less than one, and therefore the series is stationary. So if the ADF test produces a p-value of 0.05 or less, we reject the null hypothesis at the 5% significance level and conclude that the series is stationary.
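The idea behind the test can be sketched with a simplified, non-augmented Dickey-Fuller statistic in plain Python. This is not the full ADF test: it has no constant term and no lagged differences, and the function name dickey_fuller_stat is hypothetical. It rewrites the AR(1) model as y_t - y_{t-1} = \gamma y_{t-1} + \epsilon_t with \gamma = \beta - 1 , so the random-walk null hypothesis becomes \gamma = 0 . The statistic does not follow the usual t distribution under the null, so it is compared against a Dickey-Fuller critical value; the -1.95 used below is the approximate large-sample 5% value for this no-constant form, taken here as an assumption. In practice, a library routine such as adfuller in statsmodels.tsa.stattools would normally be used instead.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic for gamma in  dy_t = gamma * y_{t-1} + eps_t  (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sum_sq_lagged = sum(v * v for v in lagged)
    # OLS estimate of gamma = beta - 1
    gamma = sum(l * d for l, d in zip(lagged, diffs)) / sum_sq_lagged
    residuals = [d - gamma * l for l, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated parameter
    sigma2 = sum(r * r for r in residuals) / dof
    std_err = math.sqrt(sigma2 / sum_sq_lagged)
    return gamma / std_err

# Approximate large-sample 5% critical value for the no-constant
# Dickey-Fuller regression (assumed here; see the lead-in).
CRITICAL_VALUE_5PCT = -1.95

def simulate_ar1(beta, n_steps, rng):
    """Simulate y_t = beta * y_{t-1} + eps_t starting from zero."""
    path = [0.0]
    for _ in range(n_steps):
        path.append(beta * path[-1] + rng.gauss(0, 1))
    return path

rng = random.Random(0)
stat_stationary = dickey_fuller_stat(simulate_ar1(0.5, 500, rng))
stat_random_walk = dickey_fuller_stat(simulate_ar1(1.0, 500, rng))

for name, stat in (("beta=0.5", stat_stationary), ("beta=1.0", stat_random_walk)):
    verdict = "reject random walk" if stat < CRITICAL_VALUE_5PCT else "cannot reject random walk"
    print(f"{name}: statistic {stat:.2f} -> {verdict}")
```

A statistic well below the critical value is evidence against the random-walk null. For a cointegration check, the series passed in would be the spread between the two stocks.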